-
A major challenge in monocular 3D object detection is the limited diversity and quantity of objects in real datasets. While augmenting real scenes with virtual objects holds promise to improve both the diversity and quantity of the objects, it remains elusive due to the lack of an effective 3D object insertion method in complex real captured scenes. In this work, we study augmenting complex real indoor scenes with virtual objects for monocular 3D object detection. The main challenge is to automatically identify plausible physical properties for virtual assets (e.g., locations, appearances, sizes, etc.) in cluttered real scenes. To address this challenge, we propose a physically plausible indoor 3D object insertion approach to automatically copy virtual objects and paste them into real scenes. The resulting objects in scenes have 3D bounding boxes with plausible physical locations and appearances. In particular, our method first identifies physically feasible locations and poses for the inserted objects to prevent collisions with the existing room layout. Subsequently, it estimates spatially-varying illumination for the insertion location, enabling the immersive blending of the virtual objects into the original scene with plausible appearances and cast shadows. We show that our augmentation method significantly improves existing monocular 3D object models and achieves state-of-the-art performance. For the first time, we demonstrate that a physically plausible 3D object insertion, serving as a generative data augmentation technique, can lead to significant improvements for discriminative downstream tasks such as monocular 3D object detection.
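The abstract describes identifying physically feasible, collision-free locations for inserted objects but gives no implementation details. As an illustrative sketch only (all names, shapes, and the rejection-sampling strategy are assumptions, not the authors' method), a minimal collision check against existing 3D bounding boxes might look like:

```python
import random

def aabb_overlap(a, b):
    # Axis-aligned boxes given as (xmin, ymin, zmin, xmax, ymax, zmax):
    # they overlap iff their intervals overlap on every axis.
    return all(a[i] < b[i + 3] and b[i] < a[i + 3] for i in range(3))

def sample_insertion(existing_boxes, obj_size, room_bounds, tries=100, rng=random):
    # Rejection-sample a floor position for a virtual object until its
    # bounding box collides with nothing already in the scene.
    (rx0, ry0, rx1, ry1), (w, d, h) = room_bounds, obj_size
    for _ in range(tries):
        x = rng.uniform(rx0, rx1 - w)
        y = rng.uniform(ry0, ry1 - d)
        candidate = (x, y, 0.0, x + w, y + d, h)
        if not any(aabb_overlap(candidate, b) for b in existing_boxes):
            return candidate
    return None  # no feasible location found within the budget
```

A real system would additionally sample object pose and estimate spatially-varying illumination at the chosen location, as the abstract describes.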
-
The popularization of Text-to-Image (T2I) diffusion models enables the generation of high-quality images from text descriptions. However, generating diverse customized images with reference visual attributes remains challenging. This work focuses on personalizing T2I diffusion models at a more abstract concept or category level, adapting commonalities from a set of reference images while creating new instances with sufficient variations. We introduce a solution that allows a pretrained T2I diffusion model to learn a set of soft prompts, enabling the generation of novel images by sampling prompts from the learned distribution. These prompts offer text-guided editing capabilities and additional flexibility in controlling variation and mixing between multiple distributions. We also show the adaptability of the learned prompt distribution to other tasks, such as text-to-3D. Finally, we demonstrate the effectiveness of our approach through quantitative analysis including automatic evaluation and human assessment.
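The abstract mentions sampling soft prompts from a learned distribution and mixing between multiple distributions, without specifying the parameterization. A minimal sketch, assuming (hypothetically) a diagonal Gaussian over soft-prompt token embeddings:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_prompt(mean, std, scale=1.0, rng=rng):
    # Draw one soft prompt from a diagonal Gaussian; `scale` controls
    # how much variation to inject around the learned mean.
    return mean + scale * std * rng.standard_normal(mean.shape)

def mix_prompts(p_a, p_b, alpha=0.5):
    # Linear interpolation between prompts drawn from two distributions.
    return alpha * p_a + (1.0 - alpha) * p_b

# Hypothetical shapes: 4 soft-prompt tokens, 768-dim embeddings.
mean = np.zeros((4, 768))
std = np.full((4, 768), 0.1)
p1, p2 = sample_prompt(mean, std), sample_prompt(mean, std)
mixed = mix_prompts(p1, p2, alpha=0.3)
```

The sampled prompt would then be prepended to the text-encoder input of the diffusion model; that step is model-specific and omitted here.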
-
Field programmable gate array (FPGA) is widely used in the acceleration of deep learning applications because of its reconfigurability, flexibility, and fast time-to-market. However, conventional FPGA suffers from the trade-off between chip area and reconfiguration latency, making efficient FPGA accelerations that require switching between multiple configurations still elusive. Here, we propose a ferroelectric field-effect transistor (FeFET)–based context-switching FPGA supporting dynamic reconfiguration to break this trade-off, enabling loading of arbitrary configurations without interrupting the active configuration execution. Leveraging the intrinsic structure and nonvolatility of FeFETs, compact FPGA primitives are proposed and experimentally verified. Evaluation results show that our design achieves a 63.0%/74.7% reduction in look-up table (LUT)/connection block (CB) area and an 82.7%/53.6% reduction in CB/switch box power consumption with a minimal penalty in the critical path delay (9.6%). Besides, our design yields significant time savings of 78.7% and 20.3% on average for context-switching and dynamic reconfiguration applications, respectively.
-
Early intervention to address developmental disability in infants has the potential to promote improved outcomes in neurodevelopmental structure and function [1]. Researchers are starting to explore Socially Assistive Robotics (SAR) as a tool for delivering early interventions that are synergistic with and enhance human-administered therapy. For SAR to be effective, the robot must be able to consistently attract the attention of the infant in order to engage the infant in a desired activity. This work presents the analysis of eye gaze tracking data from five 6- to 8-month-old infants interacting with a Nao robot that kicked its leg as a contingent reward for infant leg movement. We evaluate a Bayesian model of low-level surprise on video data from the infants' head-mounted camera and on the timing of robot behaviors as a predictor of infant visual attention. The results demonstrate that over 67% of infant gaze locations were in areas the model evaluated to be more surprising than average. We also present an initial exploration using surprise to predict the extent to which the robot attracts infant visual attention during specific intervals in the study. This work is the first to validate the surprise model on infants; our results indicate the potential for using surprise to inform robot behaviors that attract infant attention during SAR interactions.
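Bayesian surprise is commonly defined as the KL divergence between an observer's posterior and prior beliefs after an observation. The abstract does not specify the model's internals; as a hedged sketch under a hypothetical Dirichlet-categorical observer (not the authors' exact formulation):

```python
import math

def categorical_kl(p, q):
    # KL divergence between two categorical distributions.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def surprise(prior_counts, observation):
    # Bayesian surprise: KL divergence between the belief over categories
    # after versus before a single observation, with Dirichlet counts.
    post = list(prior_counts)
    post[observation] += 1
    zp, zq = sum(post), sum(prior_counts)
    p = [c / zp for c in post]
    q = [c / zq for c in prior_counts]
    return categorical_kl(p, q)
```

Under this toy model, observing a rarely seen category (e.g., an unexpected robot movement) yields higher surprise than observing a frequent one, which is the intuition the attention model exploits.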